Using Fat-trees to Maximize the Number of Processors in a Massively Parallel Computer
Authors
Abstract
We investigate the problem of maximizing the number of processors in a massively parallel computer when the degree of the internal nodes and the diameter of the network are physically constrained. The solution we propose is a fat-tree with processors at the leaves. The basic building block of this fat-tree is a two-level fat-tree, which is obtained from complete sets of mutually orthogonal Latin squares. As an application of this approach, we describe a novel interconnection network in which each internal node of the fat-tree is a ring. These rings are constructed using the integrated circuit QR0001 Data Stream Controller Interface, produced by National Semiconductor. Restricted to at most 16 interfaces per ring and to a network diameter of at most four, the resulting interconnection network has 51,984 processors.
1 Introduction
Interconnection networks for massively parallel computers have been studied in depth during the past few years [1, 5, 10, 15]. Interconnection networks with a multi-ring topology, however, have not been fully investigated, although they have been found to be well suited to high-performance parallel architectures [11], and multiple rings have been used to synthesize general topologies [9]. The design of a multi-ring interconnection network must take into account physical limitations on the number of interfaces that can be accommodated on an individual ring and on the diameter of the network. We distinguish between two types of interfaces: those that provide access to a processor and those that act as a switch. The function of a switch is to allow transfer of data from one ring to another. Given the physical limitations of the particular communication medium, the question we ask is: how can multiple rings be connected in a multi-ring interconnection network so as to maximize the number of processors? We also examine how the numbers of switches and rings grow with the number of processors.
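The abstract does not spell out how the Latin squares are turned into a two-level fat-tree, so the following is background only: a minimal sketch of the standard construction of a complete set of p - 1 mutually orthogonal Latin squares of prime order p, using L_k(i, j) = (k*i + j) mod p for k = 1, ..., p - 1, together with checks that each square is Latin and that every pair is orthogonal. The function names are hypothetical, and the restriction to prime orders is an assumption of this sketch; complete sets also exist for prime-power orders via finite-field arithmetic, which is not covered here.

```python
from itertools import product


def complete_mols(p):
    """Return p - 1 mutually orthogonal Latin squares of prime order p.

    Square k (k = 1..p-1) has entry (k*i + j) mod p in row i, column j.
    This sketch only handles prime p (an assumption, not the paper's method).
    """
    if p < 2 or any(p % d == 0 for d in range(2, int(p ** 0.5) + 1)):
        raise ValueError("this sketch only handles prime orders")
    return [[[(k * i + j) % p for j in range(p)] for i in range(p)]
            for k in range(1, p)]


def is_latin(square):
    """Every row and every column contains each symbol 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Superimposing two orthogonal squares yields every ordered pair exactly once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


squares = complete_mols(5)
print(len(squares))                          # 4
print(all(is_latin(s) for s in squares))     # True
print(all(are_orthogonal(a, b)
          for idx, a in enumerate(squares) for b in squares[idx + 1:]))  # True
```

Orthogonality holds because, for k != m, the pair ((k*i + j) mod p, (m*i + j) mod p) determines (k - m)*i mod p, hence i (p is prime), and then j.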
We address the above question for interconnection networks, in general, and for a network based on the integrated circuit QR0001 from National Semiconductor, in particular. QR0001 is a high-speed communication switch with a data transmission rate of 180 MBytes/sec. It has been designed to accommodate a maximum of 16 interfaces on a ring with a maximum network diameter of four. The performance of the network depends heavily on the speed of the switches. QR0001 has high speed at low cost and is an option to …
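The two physical limits stated for QR0001 (at most 16 interfaces per ring, network diameter at most four) can be checked mechanically for a candidate multi-ring topology. The sketch below rests on assumptions not taken from the text: each ring is a list of interface IDs, every interface that is not a processor is a switch joining all rings it appears on, and "diameter" is counted as the number of rings a message traverses between two processors. The names `ring_diameter` and `satisfies_limits` are hypothetical.

```python
from collections import deque
from itertools import combinations

# Limits quoted in the text for QR0001-based rings.
MAX_INTERFACES_PER_RING = 16
MAX_DIAMETER = 4


def ring_diameter(rings, processors):
    """Largest number of rings a message traverses between two processors.

    rings: dict mapping a ring id to the list of interface ids on that ring.
    processors: set of processor interface ids, each on exactly one ring.
    Every other interface is treated as a switch joining all rings it is on
    (an assumption of this sketch).
    """
    rings_of = {}
    for ring, interfaces in rings.items():
        for iface in interfaces:
            rings_of.setdefault(iface, set()).add(ring)

    # Two rings are adjacent when a switch interface sits on both of them.
    adj = {ring: set() for ring in rings}
    for iface, on in rings_of.items():
        if iface not in processors:
            for a, b in combinations(on, 2):
                adj[a].add(b)
                adj[b].add(a)

    proc_rings = {ring for p in processors for ring in rings_of[p]}
    worst = 0
    for src in proc_rings:
        # Breadth-first search; the source ring itself counts as one ring.
        dist = {src: 1}
        queue = deque([src])
        while queue:
            ring = queue.popleft()
            for nxt in adj[ring]:
                if nxt not in dist:
                    dist[nxt] = dist[ring] + 1
                    queue.append(nxt)
        for dst in proc_rings:
            if dst not in dist:
                return float("inf")  # some processors cannot reach each other
            worst = max(worst, dist[dst])
    return worst


def satisfies_limits(rings, processors):
    """True if every ring fits the interface limit and the diameter limit holds."""
    small_rings = all(len(ifaces) <= MAX_INTERFACES_PER_RING
                      for ifaces in rings.values())
    return small_rings and ring_diameter(rings, processors) <= MAX_DIAMETER


# Two leaf rings joined through one top ring by switches s0 and s1.
two_level = {"leaf0": ["p0", "p1", "s0"], "leaf1": ["p2", "p3", "s1"],
             "top": ["s0", "s1"]}
two_level_procs = {"p0", "p1", "p2", "p3"}
print(ring_diameter(two_level, two_level_procs))     # 3
print(satisfies_limits(two_level, two_level_procs))  # True

# A chain of five rings exceeds a diameter of four.
chain = {"r0": ["p0", "s01"], "r1": ["s01", "s12"], "r2": ["s12", "s23"],
         "r3": ["s23", "s34"], "r4": ["s34", "p1"]}
chain_procs = {"p0", "p1"}
print(satisfies_limits(chain, chain_procs))          # False
```

Counting switches crossed instead of rings traversed would subtract one from each value; which convention the paper uses is not stated in this excerpt.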
Similar resources
Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method
In this article, a parallel computer program based on the Finite Element Method is implemented to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition varies gradually over their volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and the migration to IPv6 addresses. IP address lookup involves computing the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when they are employed for IPv6 address lookups. In this paper, we describe a ...
Mixed Large-Eddy Simulation Model for Turbulent Flows across Tube Bundles Using Parallel Coupled Multiblock NS Solver
In this study, turbulent flow around a tube bundle on a non-orthogonal grid is simulated using the Large Eddy Simulation (LES) technique and parallelization of the fully coupled Navier–Stokes (NS) equations. To model the small eddies, the Smagorinsky model and a mixed model were used. This model represents the effect of dissipation and the grid-scale and subgrid-scale interactions. The fully coupled NS eq...